On Privacy-Preservation of Text and Sparse Binary Data with Sketches
Abstract
In recent years, privacy-preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high dimensional, which makes them challenging for many privacy-preservation algorithms. However, some domains of such data have special properties that make sketch-based techniques particularly useful. In this paper, we present a new method for privacy-preserving data mining of text and binary data using a sketch-based approach. The special property we exploit is sparsity: only a small percentage of the attributes have non-zero values. We formalize an anonymity model for the sketch-based approach and use it to construct sketch-based privacy-preserving representations of the original data. This representation allows accurate computation of a number of important data mining primitives, such as the dot product. Therefore, it can be used for a variety of data mining algorithms, such as clustering and classification. We illustrate the effectiveness of our approach on a number of real and synthetic data sets. We show that the accuracy of data mining algorithms is preserved by the transformation even in the presence of increasing data dimensionality.
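To make the dot-product claim concrete, here is a minimal sketch in Python, assuming a random ±1 projection (an AMS-style sketch). The abstract does not specify the construction, so the function names `make_signs`, `sketch`, and `estimate_dot`, the sketch length `k`, and the seed are illustrative assumptions. The anonymity model mentioned in the abstract is not shown.

```python
import random


def make_signs(dim, k, seed=0):
    """Draw k random sign vectors of length dim with entries in {-1, +1}.

    Assumption: this sign-projection scheme is one common sketch
    construction, chosen here only for illustration.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(k)]


def sketch(sparse_vec, signs):
    """Project a sparse vector (dict: index -> value) onto each sign vector.

    Only the non-zero attributes are visited, so the cost depends on the
    number of non-zero entries rather than on the full dimensionality.
    """
    return [sum(value * row[i] for i, value in sparse_vec.items())
            for row in signs]


def estimate_dot(sketch_a, sketch_b):
    """Estimate the dot product as the mean of component-wise products."""
    return sum(a * b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


# Two sparse binary "documents" over a hypothetical 1000-term vocabulary.
doc_a = {3: 1, 17: 1, 42: 1}
doc_b = {17: 1, 42: 1, 99: 1}

signs = make_signs(dim=1000, k=200, seed=7)
estimate = estimate_dot(sketch(doc_a, signs), sketch(doc_b, signs))
exact = sum(v * doc_b.get(i, 0) for i, v in doc_a.items())
print("estimated:", estimate, "exact:", exact)
```

With this construction the estimate is unbiased, and its variance shrinks as `k` grows. Under that assumption, only the sketches would need to be shared rather than the original records.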
Similar resources
Privacy and Security of Big Data in the Cloud
Big data has attracted growing interest in both scientific and industrial fields for its potential value. However, before employing big data technology in massive applications, a basic but essential topic should be investigated: security and privacy. One of the biggest concerns of big data is privacy. However, the study of big data privacy is still at a very early stage. Many or...
Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk & Motwani, 1998). In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted, which requires applying a large number of permutat...
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...
Outcomes of Observe privacy in hospitalized patients: qualitative content analysis
Abstract: Introduction: In medical ethics, protecting the privacy of patients is part of patients' rights and the basis of patient care. Since the outcomes of respecting privacy as one of the dimensions of patients' rights are not clear, this study was performed to explore the outcomes of respecting the privacy of hospitalized patients. Methods: The study was performed on 20 patients hospit...
Publication date: 2007